<information science> A measure of how closely a given object
(file, web page, database record, etc.) matches a user's search
for information.

The relevance algorithms used in most large web {search
engines} today are based on fairly simple word-occurrence
measurement: if the word "daffodil" occurs on a given page,
then that page is considered relevant to a query on the word
"daffodil". Its relevance is quantised according to the number
of times the word occurs in the page and to whether "daffodil"
occurs in the title of the page, in its META keywords, in the
first N words of the page, in a heading, and so on. Words that
a stemmer says are based on "daffodil" are treated similarly.
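
As a rough illustration of the word-occurrence measurement described above, the following Python sketch scores a page against a one-word query. The weights, the value of N, the page layout (a dictionary with "title", "keywords", "heading" and "body" fields) and the toy stemmer are all assumptions made for this example; they are not taken from any real search engine.

```python
import re

# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"body": 1.0, "first_n": 1.5, "title": 5.0, "keywords": 3.0, "heading": 2.0}
FIRST_N = 50  # the "first N words" window; N = 50 is an arbitrary choice


def tokenize(text):
    """Split text into lower-case alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word):
    """Toy stemmer: strip a trailing plural "s" from words longer than 3 letters."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def relevance(page, query_word):
    """Score a page (a dict of text fields) against a single query word."""
    target = stem(query_word.lower())
    body = [stem(w) for w in tokenize(page.get("body", ""))]

    # Each occurrence in the body adds to the score.
    score = WEIGHTS["body"] * body.count(target)

    # Bonus if the word appears within the first N words of the body.
    if target in body[:FIRST_N]:
        score += WEIGHTS["first_n"]

    # Bonus for each special field that contains the word.
    for field in ("title", "keywords", "heading"):
        if target in [stem(w) for w in tokenize(page.get(field, ""))]:
            score += WEIGHTS[field]
    return score


page = {
    "title": "Growing daffodils",
    "body": "The daffodil is a spring flower. Daffodils bloom early.",
}
print(relevance(page, "daffodil"))
```

A real stemmer (for example a Porter-style one) would replace the toy `stem` function; it is kept trivial here so that the sketch stays self-contained.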

More elaborate (and resource-expensive) relevance algorithms
may involve thesaurus (or synonym ring) lookup; e.g. such an
algorithm might rank a document about narcissuses (even one
that does not mention the word "daffodil" anywhere) as relevant
to a query on "daffodil", since narcissuses and daffodils are
basically the same thing. The same goes for queries on "jail"
and "gaol", etc. More elaborate forms of thesaurus lookup may
involve multilingual thesauri (e.g. knowing that documents in
Japanese which mention the Japanese word for "narcissus" are
relevant to your search on "narcissus"), or may involve thesauri
(often auto-generated) based not on equivalence of meaning, but
on word-proximity, such that "bulb" or "bloom" may be in the
thesaurus entry for "daffodil".
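
The synonym-ring lookup described above can be sketched in Python as query expansion: the query word is replaced by every word in its ring before matching. The ring contents and the function names `expand_query` and `matches` are hypothetical, invented for this example; a production thesaurus would be far larger and would normally be combined with stemming.

```python
import re

# Hypothetical synonym rings; each set lists words treated as equivalent.
SYNONYM_RINGS = [
    {"daffodil", "narcissus"},
    {"jail", "gaol"},
]


def expand_query(word):
    """Return the query word together with every word in its synonym ring."""
    word = word.lower()
    for ring in SYNONYM_RINGS:
        if word in ring:
            return set(ring)
    return {word}  # no ring found: the word stands alone


def matches(document, query_word):
    """True if the document contains the query word or any ring synonym."""
    words = set(re.findall(r"[a-z]+", document.lower()))
    return bool(words & expand_query(query_word))


print(matches("The narcissus flowers in early spring.", "daffodil"))
print(matches("He was sent to gaol.", "jail"))
```

Both documents match even though neither contains the literal query word, which is the behaviour the thesaurus lookup is meant to provide.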

Word spamming essentially attempts to falsely increase a web
page's relevance to certain common searches.

See also subject index.
(1997-04-09)